Grounding (i.e., localizing) arbitrary, free-form textual phrases in visual content is a challenging problem with many applications for human-computer interaction and image-text reference resolution. Few datasets provide the ground-truth spatial localization of phrases, so it is desirable to learn from data with little or no grounding supervision. We propose a novel approach which learns grounding by reconstructing a given phrase using an attention mechanism, which can be either latent or optimized directly. During training, our approach encodes the phrase using a recurrent network language model and then learns to attend to the relevant image region in order to reconstruct the input phrase. At test time, the correct attention, i.e., the grounding, is evaluated. If grounding supervision is available, it can be applied directly via a loss over the attention mechanism. We demonstrate the effectiveness of our approach on the Flickr 30k Entities and ReferItGame datasets with different levels of supervision, ranging from none through partial to full supervision. Our supervised variant improves by a large margin over the state of the art on both datasets.
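To illustrate the core idea, the following is a minimal NumPy sketch of attend-then-reconstruct grounding, not the paper's implementation: attention weights over candidate image regions are computed from a phrase embedding, a weighted region feature is formed, and the phrase embedding is reconstructed from it. All names (`W_att`, `W_rec`, dimensions) are hypothetical placeholders; a real system would learn these parameters by minimizing the reconstruction loss, and the attention weights themselves serve as the grounding.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over region scores
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_and_reconstruct(phrase_vec, region_feats, W_att, W_rec):
    """Latent attention over regions, then phrase reconstruction.

    phrase_vec:   (d_p,)  encoded phrase (e.g. from an RNN language model)
    region_feats: (n, d_r) features of n candidate image regions
    W_att:        (d_r, d_p) hypothetical attention parameters
    W_rec:        (d_p, d_r) hypothetical reconstruction parameters
    """
    scores = region_feats @ W_att @ phrase_vec  # one score per region
    alpha = softmax(scores)                     # attention = soft grounding
    attended = alpha @ region_feats             # attention-weighted region feature
    recon = W_rec @ attended                    # reconstructed phrase embedding
    return alpha, recon

# Toy example with random parameters (untrained, for shape checking only)
rng = np.random.default_rng(0)
d_p, d_r, n = 8, 16, 5
phrase = rng.standard_normal(d_p)
regions = rng.standard_normal((n, d_r))
W_att = rng.standard_normal((d_r, d_p))
W_rec = rng.standard_normal((d_p, d_r))

alpha, recon = attend_and_reconstruct(phrase, regions, W_att, W_rec)
# alpha is a distribution over the n regions; argmax gives the predicted grounding
```

At training time only the reconstruction error between `recon` and `phrase` would be penalized (the attention stays latent); with grounding supervision, an additional loss can be placed directly on `alpha`.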